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Abstract. Recently developed adaptive Markov chain Monte Carlo (MCMC) 
methods have been applied successfully to many problems in Bayesian statistics. 
Grapham is a new open source implementation covering several such methods, with 
emphasis on graphical models for directed acyclic graphs. The implemented algo- 
rithms include the seminal Adaptive Metropolis algorithm adjusting the proposal 
covariance according to the history of the chain and a Metropolis algorithm adjust- 
ing the proposal scale based on the observed acceptance probability. Different vari- 
ants of the algorithms allow one, for example, to use these two algorithms together, 
employ delayed rejection and adjust several parameters of the algorithms. The im- 
plemented Metropolis- within- Gibbs update allows arbitrary sampling blocks. The 
software is written in C and uses a simple extension language Lua in configuration. 



1. Introduction 

Markov chain Monte Carlo (MCMC) is a general framework for computing ex- 
pectations over complicated distributions in general state spaces. The methods 
are based on constructing a Markov chain (X„)„>i so that the ergodic averages 
Im = '^k=i fi-^k) converge to / = / /(x)7r(x)dx as ^ oo, where vr is the 
target distribution of interest. Such a chain is o ften easy to construct usi ng the 
Metropolis-Hastings algorithm; see, for example, iRobert and Casella fllQQOl ). De- 
pending on 71, however, it may be difficult to design a practical algorithm so that In 
would approximate I well with a moderate number of samples A^. 

Recently proposed adaptive MCMC algorithms adjust the parameters of the algo- 
rithm (the proposal distribution) on-the-fly, aiming to allow effici ent simulation . They 
have attracted increasing attention in the last few years, after iHaario et al.l (2001 ) 
presen ted the seminal Adaptive Metropolis (AM) algorithm, and lAndrieu and Robert 
( 120011 ) related adaptive MCMC to the general context of the Robbins-Monro stochas- 
tic approximation. After that, several authors have proposed new algorithms and 



variations, and provided theoretical validation of the method s (^Haa rio et al. . 2005 
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Grapham is an open source implementation of several adaptive MCMC algorithms 
based on the random walk Metropolis sampler. The purpose of Grapham is to provide 
an experimental tool for evaluating the performance of such algorithms with practical 
problems, especially in Bayesian statistics. The source code of the software and addi- 



ti onal docuraentati on are available for downloading in http : / / iki . f i/mvihola/grapham/'. 
Rosenthal ( 200?! ) describes another adaptive MCMC implementation: AMCMC. It 



is an R interface to one adaptive MCMC algorithm (referred to as 'ASCM' in Section 
[2] below). Grapham differs from AMCMC in that it relies on a hierarchical model 
specification and provides more alternative algorithms. Unlike AMCMC, Grapham 
also provides a set of ready-made standard distribution functions the user can employ 
as a part of their model specification. This is intended to allow faster development 
while permitting the user to define arbitrary distributions easily. 

The models are specified in Grapham by defining a set of variables with their 
conditional distr ibutions. Such m odels are often referred to as 'graphical models'; 
see, for example. iLauritzeru (119961 ) and references therein. This underly ing philosophy 



of Gra pham reminds that of BUGS (jSpiegelhalter et al.l . ll996-2008[ ): see also the 



Murphyl ( 2007 ) of other software for graphical models. The advantage of 



review 

Grapham over BUGS is that the adaptive MCMC algorithms can be much more 
efficient than the non-adaptive (Metropolis- within-) Gibbs algorithms of BUGS. One 
should, however, notice that Grapham is an experimental project not offering the 
versatility and maturity of BUGS. It is also likely that BUGS performs better than 
Grapham with many simpler models. 



2. Algorithms 

The general form of the algorithms implemented in Grapham can be described as 
follows. Let Xq = xo G M'^ be a given starting point for the state chain, and 6q 
and Lq stand for the initial scaling parameter and the (lower-diagonal with non-zero 
diagonal) shape matrix, respectively. For n > 1, the recursion follows: 

(51) form a proposal F„ = + 9n-iLn-iWn, where Wn is an independent sample 
from a symmetric proposal distribution q^, 

(52) with probability a„ = min{l, 7r(y„)/7r(X„_i)}, the proposal is accepted and 
Xn = Yn, otherwise, the proposal is rejected and X„ = X„_i, and 

(53) update the scaling parameter On-i — 6*^ > and the shape L„_i Ln G M.'^^'^ 
according to the selected adaptive algorithm. 

The steps (91]) and (32]) implement an iteration of the random-walk Metropolis al- 
gorithm with the proposal distribution go scaled by the factor Step (92]) 
implements the adaptation, changing the scahng parameters 9n and L„ based on the 
history of the chain. Examples of such updates are given below. 

Instead of applying the iteration (3I])^(33]) at once to all the elements of the vec- 
tor Xn, one may use Metropolis-within-Gibbs and apply the iteration sequentially to 
su bsets of the el e ment s of X„, as in the single component AM algorithm suggested 
by Haario et al.l ( 20051 ) . These sampling blocks can be selected freely in Grapham. 
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The proposal distribution go in (31]) can also be chosen. Grapham currently imple- 
ments (multivariate) Gaussian, student, uniform (in a cube) and (a cZ-fold product 
of) Laplace proposal distributions. 

The adaptation of ( ^ depends on the se lected algorithm. The Adaptive Metrop- 
olis (AM) algorithm of iHaario et al.l ( 200ll ) implies constant scaling 9n = Oq for all 
n > 1. The shape matrix L„ is the Cholesky factor of a covariance estimate of the 
chain. In particular, L„L^ = C„ with a positive definite Cq G M.'^^'^ and defined 
through 



(1) 
(2) 



Cn 



(1 - r]n)Mn-i + r]nXn and 
(1 - r/„)C„_i + T]n{Xn - M„_i)(X„ - M, 



n—1. 



with Mo = xq. The weight sequence rjn G (0, 1) can be selected arbitrarily, but it is 
recommended to choose //„ decaying to zero. For example, setting rjn = rjo > for 
all n > 1 results in an algorithm similar to the Adaptive Proposal (AP) algorithm 



(IHaario et al.l . 119991 ) . This algori t hm do es not, in general, provide valid simulation; 



see the example in IHaario et al 
default value rjn = {n + 



((20011). 



The original AM algorithm employs the 
in which case M„ and C„ coincide with the average 



and (asymptotically) the sample covariance of Xq, . . . , X„, respectively. The updated 



Cholesky factor L 

requiring 0{(P) operations (iDongarra et al. 



1 of C„,4 -i is computed efficieii tly from L„ by a rank one update 

19791).^ Observe that the same order of 



operations is needed when forming the proposal Yn in (31]) • 

The adap tive scaling Metropolis (ASCM ) algor ithm as proposed by lAtchade and Rosenthal 
( I2OO5I ) and [Roberts and Rosenthall tooj . l2009l ) leaves the shape matrix constant 
Ln = Lq for all n > 1. The scaling parameter On is updated according to the ob- 
served acceptance probability. The default update in Grapham is 



(3) 



6„ = 6, 



n-l 



a* 



where a* is the desired acceptance probability. The default values for a* ar e 0.44 
in dimension one and 0.234 otherwise following iRoberts and Rosenthal! (l2009f ). The 
user can also supply an alternative, arbitrary update function easily, as exemplified 
in Section m 

These two algor i thms, AM and ASCM, can be used sira ultaneously, as suggested in 
Atchade and Fort ( 2008 ) and Andrieu and Thoms ( 20081) . Additional flavours to the 
algorithms include a Rao-Blackwellised version of AM (lAndrieu and Thomd . l2008l ) 
modifying the update formulae ([T]) and ^ to 



(4) Mn = {l-r]n)Mn-i+r]n[anYn + {l-an)Xn-i] and 

(5) Cn = {1 - Vn)Cn-l + Vn[an{Yn - Mn-l){Yn - Mn-l)^ 

+ {l-an){Xn-l- Mn-l){Xn-l 



M 



n—1 J 



Ther e is a possibility to use (two-stage) delayed rejection (DR) with AM (IHaario et al. 



20061 ). DR can also be applied when using ASCM, so that only the first-stage accep- 
tance probability a„ is employed in ([3]). 
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Grapham implements three different burn-in strategies for adaptation. The default 
'greedy' strategy performs continuo us adaptation during the whole MCMC run. The 
'traditional' strategy as proposed in Haario et al.l ( 2001 ) uses a fixed proposal for the 
burn-in and then performs continuous adaptation during the rest of the simulation. 
One may also apply a 'freeze' strategy adapting only during the burn-in and keeping 
the obtained parameters fixed during the estimation run. 

It is possible t o employ a mixture of two pro posal density components, a fixed and 
an adaptive one ( Roberts and Rosenthal . 20091 ). This is implemented in Grapham so 
that, with probabihty Pmix, the initial parameters Lq and 6*0 are used in instead 
of the adapted values ft„_i and Ln-i- The user may define also a non-constant 
mixing probability p^^^ G [0, 1]. This feature can be used, for example, to introduce 
a 'gradual burn-in,' by defining a decaying sequence pf^j^ ~^ 0- 



3. Implementation 



Grapham does not have an interactive 'user interface.' It is simply executed 
from the command prompt (shell) with input file names as parameters. The in- 
put files contain the model specification and the simulation parameters. It is also 
possible to define the functional of interest in the input files. For more compli- 
cated functionals, however, it may be convenient to store (a subset of) the samples 
simulated by Grapham and process them in another environment. The samples 
can be saved into a file in the CSV (comma separated values) format or in a sim- 
ple binary format. The former allows the samples to be easily imported to many 
other enviro nments. There are ready-made f unctions for loading the binary data 
files into R (IR Development Core Teaml. 120091 ). Matlab® (The MathWorks, Natick, 
Massachusetts) and Octave (lEatonl . 120021 ) environments. 

The core of Grapham is implemented in C. It includes s ome numerical Fortran 
subroutines from the Netlib repository (IBrowne et al.l . 19951) and can optionally be 
compiled with the dSFMT random number generator o f ISaito and Matsumotol (l2008f ) 
instead of using the random number generators provided by the C standard libraries. 
The configurat ion of Grapham is done usin g the small and publicly available extension 



language Lua ( lerusalimschy et al. . 19961 ). While minimalistic, Lua is in fact a full- 



featured programming language offering a great flexibility. For example, the user can 
supply functions as configuration parameters and apply data from external files in 
the model. In fact, Grapham includes some functions written in Lua, for exampl e for 
reading data files in the CSV format. The Numeric Lua package (jCarvalhd . l2005l ) can 
also be compiled with Grapham to allow easy working with vector-valued variables. 

There are numerous ready-made distribution functions available for defining the 
conditional densities associated with the variables. The densities can also be defined 
arbitrarily as Lua functions. Likewise, the functional of interest may be written in 
Lua. However, to allow optimal performance, Grapham allows the user to supply 
densities and functionals in a separate C library with ease. 
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Figure 1. The graphical representation of the baseball model. The 
nodes with observed values ('data') are shown in grey. 



4. An Example Session 



Con sider the basebal l model of lRosenthall (119961 ) used as an example also with AM- 
CMC (IRosenthall . 120071 ) . It consists of 38 real- valued variables, defined hierarchically 
as depicted in Fig. [TJ The file specifying this model in Grapham is shown in Fig. [2J 
The model is defined in the Lua table model, defined in lines 4-20. Each variable 
is defined by an entry containing a logarithmic density, conditional on the parent 
variables. The variables /i, t and y in the example have standard distributions: /i has 
(an improper) uniform distribution over M, while t and y are conditionally Gaussian 
with means n and t and variances a and v, respectively. The reciprocal of the variable 
a is exponentially distributed; this is defined through a Lua function defined in lines 
16-18, calling dexp, the exponential distribution function. The model is, in fact, then 
modified by the function repeat_block. The block of variables [y, t) in the model 
is replicated 18 times to obtain blocks (yi,ti), . . . , (l/isj^is)- At the same time, the 
function repeat_block sets the values of yi to the 18 values read from the CSV file 
baseball .data using the function read_csv. 

The following shows an example run of Grapham with the model specification of 
Fig.H 

$ ./grapham models/baseball . lua 

Functional average = [ 0.392507 0.267393 0.318917 ] 

Acceptance rates: ( a ): 44.03% ( t7 ) : 43.97% ( t9 ) : 44.00% 

The part of the output shown above contains the computed estimate of the expected 
value of the functional specified in lines 23-25 of Fig. [21 that is, the r nean of the 
vecto r a], giving a similar estimate for ti as obtained by AMCMC (IRosenthall . 
20071 ). Moreover, the average acceptance rate of the nodes was approximately 44%, 
which is the default value of the desired acceptance probability a*. The run consisted 
of 40000 (of which 10000 burn-in) iterations using the ASCM algorithm for each real- 
valued variable at a time. This algorithm is very similar to the one implemented 
in AMCMC. The running time of Grapham was approximately 1.0 seconds with 
Intel Pentium 4 at 2.80GHz. As a comparison, the same run with AMCMC (with 
both the density and the functional specified in C for optimal performance) took 
approximately 3.8 seconds. The faster simulation speed of Grapham is explained by 
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1 


const = { 


2 


V = 0.00434 


3 


> 


4 


model = { 


5 


mu = { 


6 


density = "duniform" 


7 


}. 


8 


t = { 


9 


parents = {"mu" , "a"} , density = "dnorm" 


10 


}. 


11 


y = i 


12 


parents = {"t", "v"}, density = "dnorm" 


13 


}. 


14 


a = { 


15 


init_val = 1, 


16 


density = function(a_) 


17 


return dexp(l/a_, 1/2) 


18 


end 


19 


}, 


20 


> 


21 


_, y = read_csv( "models/baseball. data") 


22 


repeat_block({"y" , "t">, y [1] ) 


23 


function functional () 


24 


return {tl, mu, a} 


25 


end 


26 


para = { 


27 


niter = 30000, nburn = 10000, algorithm = "ascm" 


28 


> 



Figure 2. The Lua code in the file models/baseball . lua specifying 
the model of Fig. [1] in Grapham. 



the hierarchical model setup, which Grapham can take advantage of. That is, only 
part of the conditional densities in the target distribution need to be evaluated when 
each variable is updated. 

Let us modify the above example, by adding the lines shown in Fig. [3] to the 
model specification of Fig. [2l The supplied function para. scaling_adapt replaces 
the default update in (pi), and iii fact i mplements exactly the scaling adaptation 
algorithm of AMCMC fEosenthaj . l2007l : [Roberts and Rosenthal l2009h . The value 
set to the parameter para.dr means that delayed rejection is used, with a 0.1 times 
down-scaled proposal in the second stage. Moreover, instead of the default Gaussian 
distribution, the proposal samples are drawn from a Student's t-distribution. 

$ ./grapham models/baseball . lua models/amcmc_dr.lua 
Functional average = [ 0.392465 0.266204 0.321466 ] 
Acceptance rates: ( a ): 70.26% (47 . 49yo/22 . 777o) ( t7 ) : 70.667. 
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1 



para. scaling_adapt = function(sc, alpha, dim, k) 



2 



if alpha>0.44 then 



4 



3 



delta = 1 
else 



6 



5 



delta = -1 

end 



9 



8 



7 



return sc*exp(delta*min(0 . 01 , l/sqrt(k+l))) 

end 

para.dr = 0.1; para. proposal = "student" 



Figure 3. The Lua code in the file models/aiiiciiic_dr . lua. 



(47.55y./23.10y.) ( t9 ): 71.08% (47.62%/23.46y.) 

In this case, the total acceptance rate of each block is around 70%, of which roughly 
two thirds are accepted in the first stage and one third in the second, delayed rejection 
stage. The estimates obtained for ti, n and a appear similar to the first run. 

Finally, to exemplify how the data simulated by Grapham can be used in other 
environments, let us run Grapham with the command line 

$ ./grapham models/baseball . lua -e "para. outf ile='bb. bin' " 

This command includes the chuck of Lua code para . outf ile= ' bb . bin ' after reading 
the file baseball . lua. As a consequence, the simulated samples are written in the 
file bb . bin. In R, one could, for example, write 

> source (" tools/grapham_read. r") 

> data <- grapham_read("bb.bin" , nthin=10) 

> plot(data$a, data$tl) 

which would plot every tenth of the 30000 simulated samples of (a, ti) in the same 



Grapham provides a flexible open-source test bed for evaluating the performance 
of different adaptive random walk Metropolis algorithms, especially with hierarchi- 
cal models often encountered in Bayesian statistics. It provides a fairly simple and 
general way of determining models and functionals and for incorporating data into 
the model. The simulation speed of Grapham is good, even in a relatively high- 
dimensional setting, as the implemented algorithms involve at most a quadratic 
number of operations with respect to the dimension. The user has extensive con- 
trol over the various parameters of the algorithms, enabling a thorough testing of 
different adaptation strategies. Moreover, new adaptive algorithms of the similar 
random walk type can be easily added to Grapham. 



figure. 



5. Conclusions 
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